119354-t4-hour-hotfix-that-didnt-happen
Right, and if they did that, people would be crying about how Carbine says every time that the maintenance is supposed to take 6h and it only took 2h.
----
When has Carbine kept to what they say? They contradict themselves frequently :blink: But I'd rather they put things back instead of rushing through it and releasing it with even more bugs like they did with drop 3. I'm crossing my fingers and hoping the server latency on Jabbit was one of the issues they were trying to resolve. I'm not sure how much more of that I can take before I rage quit and delete the client from my PC.
----
Crap response. The person is well within their rights to be a bit peeved about this. Being kept on a shoestring means you can't just arrange something that you could have done if you'd known you had 4-6 hours and not 30 mins to 1 hour.
----
Unexpected events are, well, unexpected. That means that you don't know for certain in advance how long they are going to take to resolve. If it had been expected, well, then it would have been included in the first estimate. It's like suddenly getting stuck in traffic due to an accident, flooding, or something like that, and you call home to say you are going to be late for dinner. "Sorry, I'm going to be late! How late? Well, looks like about an hour..." An hour later, the traffic jam has still not cleared, and you have to call home again and give a new estimate...
----
You have no clue about server deployments on this scale... It's not as easy as "just roll it back".
----
Working for a global company, I think you'll find I do. There should be rollback procedures in place, not just a mess of hands going in changing things to plug the gaps as it goes in or to get it back out.
You don't work at Carbine to my knowledge, so you can't comment on how easy or hard it is to do. The fact is, a rollback should always be prepared for every release, with a window defined for when the rollback is due to be initiated, so you don't get issues where you are constantly saying "oh, a few more minutes to fix this, damn, it's broken something else". You have a defined window, and if it doesn't work you go and fix what's broken, because you can't afford to take a Live system down.
----
Working at a large game company, I CAN comment on how these things work, and do. And trust me when I say that software related to games does not work like your "global company software".
----
No, I don't think there's any effect at all if you didn't patch. It's exactly the same version as before the downtime, so nothing should appear changed.
----
Newsflash: game servers and software are not "basic things". You cannot just ignore the fact you're dealing with a cluster or shard setup. These things are delicate, and simply initiating your deploy in the wrong order can destroy your environment. So again, this is not as easy as you think it is.
----
Yes, that's why you have to have the rollback tested and in place and in the correct order. Just like other complex systems, you do things in the wrong order and it will break stuff and not work. That's why the rollback should be tested beforehand. I've not once claimed it's easy; it's complex, but it should be in place if it's required. The people are good enough to write a game, they're good enough to write a rollback.
----
Even if this is tested in a staging environment, you have no foolproof guarantee that it works 100% on your live environment. Working for a "global company", I'm sure you know this already.
----
Dude... rolling back is not even remotely close to "releasing code". Whatever your patch did is not even guaranteed to be reversible.
----
How do you know they don't have a rollback in place for when/if it's required? A hotfix "OOPS!", IMNSHO, doesn't seem like a thing the rollback needs to be used for. Hell, if they didn't use rollbacks for the exploits, why the heck should they use them for a hotfix "OOPS!"? :P So we lost a couple of hours of gameplay this morning; it's not like that's unexpected in MMOs. And it was quite short compared to other times/games this has happened with.
----
I didn't say they don't have one; I hope they do. What I said was they should maybe be a bit more stubborn about when they initiate them. Purely in the context of what the OP is talking about, stringing people along: if the downtime is set to last 1 hour, then you know your rollback is 30 mins; after 30 mins, if it's not working, take it back out. Alternatively, put a message out saying apologies, there's an issue, we're extending for 1 hour, and if you haven't solved it in 1 hour then roll it back. Don't do the same over and over, extending and extending.
----
Like cutting a cable in half when you don't have another in the cupboard? Enlighten me, what can you do that you can't put back? If you don't plan for it, you're right; if you know what you're changing, you sure as hell can put it back.
----
Do you honestly think deploying a new server patch is along the lines of "git pull"? You're talking configuration changes, database updates, Windows Update patches (good luck rolling that one back without trouble), external dependencies that maybe got updated and broke something for you. Let's take a "global company" one: upgrading from .NET 3.5 to .NET 4.
----
Why can't you do this? I fully understand what you're saying: there are tons of variables etc., it's a royal pain in the backside, things will get missed, and it involves loads of teams and people. But all of the external libraries, server patches etc. can and should be tested.
I also accept that CRB is not massive and doesn't seem to have the infrastructure for a second PTR or Live proving platform. I didn't come here to start flinging mud at people anyway, like Intensive, so I'll just leave it at this: I thought your post to the OP was a bit harsh, considering he/she was sat there waiting for the server to come up and it frustrated them.
----
Chua sorry
----
If I have to explain this to you, I highly doubt you work at a "global company" or even have the slightest clue about server software architecture.
----
That's my point: you don't need to explain it, I've done it.
----
I agree completely. I haven't had the pleasure of working in the games industry, but I used to work for a software company that ran cloud servers for its multitude of clients. And I can tell you, new software versions were the most nightmarish part of the job. You can have the best testing software out there, the most dedicated QA team, the best devs and support staff. None of that matters; inevitably something will slip through and you'll end up with a totally FUBAR live server. The important thing is what you do next. Carbine took what I consider to be the best course of action. They put in a new patch, but there were major unexpected issues, so they spent some time trying to fix them, and when they couldn't, they put the old build back in. I don't get why people are asking about a "rollback", because that's pretty much exactly what they did. In a rollback situation you either replace the build (sometimes manually from the command line if the server is a total goner; had some fun sleepless nights with that) or go full wipe and restore a backup. The former is the most common because it's the easiest; the latter is the nuclear option you only use when you have to, because your servers are down for as long as it takes to replace the image as well as all user data.
With our company's relatively simple client data and servers hosted in the same building, that would be a few hours for a largish client with a few thousand machines; for a product like Wildstar I wouldn't be surprised if that took them down for a day or more. And that's assuming the backup just magically appeared a minute after the server went down for the hotfix; otherwise you're also looking at potentially hours of lost progress for players. I'm not saying it wasn't an annoying situation; I wanted to play too for the last two or three hours of it. Heck, I wouldn't even argue if people wanted a free day or two of playtime because of it. The devs are big boys and I'm sure they understand the irritation it caused to players with tight schedules or shift workers (myself included). I'm just saying don't assume there is some magic "solve everything" button that works on a massive sharded server setup like theirs. Remember that they wanted everything fixed as quickly as possible too; they're not going to make extra work for themselves for no reason.
----
Again, this might work in your "global company" with shit that doesn't need to be compiled, or a website. A game server is not a website and runs binaries. And I'm leaving it at this: as others pointed out, you have no clue about gaming installations whatsoever and are ranting about something that was handled really well by Carbine.
----
They went a few hours past the time they said they would. I don't find that unreasonable at all (I've played games where they were down for 6+ hours, sometimes as long as 12), and IMNSHO it seems a decent amount of time to know they worked their asses off on it before they knew for sure it wouldn't work. For me personally, I'd rather know they'll take a couple of extra hours to figure something out instead of calling it quits in just 30-60 mins. Troubleshooting by itself can take that long, then working the fix in can add to that.
Like I said, I'd rather they worked as much as they could before calling it. And I guess I'm just a bit more tolerant because this is a rare occurrence (especially for a game as young as this one is).
----
If you're talking ASP.NET, I can let it slide, 'cause that's indeed a bit more than updating pages. If you're talking SAP, please start running.
----
Because they didn't do it fast enough, is my guess.
----
Well, let it slide then. As I said, I haven't questioned your credentials or credibility at all; I was simply saying the initial post was a tad harsh to someone frustrated because they weren't used to MMO patches etc.
----
It's OK, Blizzard does it every Tuesday: what should be a 4hr maintenance turns into a day-long maintenance.